peptides will not maintain any amino acid composition pattern, but rather
a random distribution. In other words, each of the 20 amino acids
has the same probability of appearing at each residue of a non-cleaved peptide.
Therefore, it is expected that the homology scores of cleaved peptides and
the homology scores of non-cleaved peptides should, in theory, differ.
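Based on the above analysis, BBFNN is designed in the following
way. Suppose there are K bio-bases. A non-numerical peptide is then
mapped to a K-dimensional space, i.e., $\mathcal{B}(\mathbf{x}, \mathbf{s}_1)$, $\mathcal{B}(\mathbf{x}, \mathbf{s}_2)$, $\cdots$, and
$\mathcal{B}(\mathbf{x}, \mathbf{s}_K)$. A linear function is generated for combining the K bio-basis
functions through the weighting parameters $w_k$,

$$\sum_{k=1}^{K} w_k \, \mathcal{B}(\mathbf{x}, \mathbf{s}_k) \tag{3.42}$$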
It is expected that this linear combination of the bio-basis functions
follows a bimodal distribution if the weights ($w_k$) have been well estimated.
This bimodal distribution is expected to fit the classification of peptides
in terms of peptide status ($y$). A linear classifier can then be built in this
K-dimensional bio-basis function space,

$$\hat{y} = \sum_{k=1}^{K} w_k \, \mathcal{B}(\mathbf{x}, \mathbf{s}_k) \tag{3.43}$$
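The mapping and the weighted sum in (3.42) and (3.43) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_similarity` is a hypothetical stand-in (the fraction of matching positions) and is not the bio-basis function defined in this chapter, and the bio-bases and weights are made-up values.

```python
def toy_similarity(x, s):
    """Hypothetical stand-in for B(x, s): fraction of positions where x and s agree.

    This is NOT the bio-basis function of the text; it only provides a number
    per (peptide, bio-basis) pair so that the linear form can be shown.
    """
    if len(x) != len(s):
        raise ValueError("peptide and bio-basis must have equal length")
    return sum(a == b for a, b in zip(x, s)) / len(x)


def map_peptide(x, bio_bases, basis=toy_similarity):
    """Map a non-numerical peptide to a K-dimensional vector, one entry per bio-basis."""
    return [basis(x, s) for s in bio_bases]


def linear_score(x, bio_bases, weights, basis=toy_similarity):
    """Weighted sum of the K basis responses, i.e. the form of (3.42)/(3.43)."""
    features = map_peptide(x, bio_bases, basis)
    return sum(w_k * b_k for w_k, b_k in zip(weights, features))


# Made-up example with K = 3 bio-bases of length 4.
bio_bases = ["ACDE", "ACDF", "WYVT"]
weights = [1.0, 0.5, -1.0]

features = map_peptide("ACDE", bio_bases)
score = linear_score("ACDE", bio_bases, weights)
```

Any other function of a peptide and a bio-basis can be passed through the `basis` argument, so the same sketch applies once the actual bio-basis function is substituted.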
Suppose X is a collection of peptides and S is a matrix of all peptides
mapped to this K-dimensional bio-basis function space. The matrix S has
N rows for N peptides and K columns for K bio-bases. A target vector of
classification labels is denoted by $\mathbf{y}$,

$$\mathbf{y} = (y_1, y_2, \cdots, y_N)^{t} \tag{3.44}$$
Similarly, a weight vector is represented by $\mathbf{w}$,

$$\mathbf{w} = (w_1, w_2, \cdots, w_K)^{t} \tag{3.45}$$
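With S, the target vector and the weight vector in hand, applying (3.43) to every peptide amounts to one weighted sum per row of S. The sketch below shows only the shapes involved; the entries of `S` and `w` are made-up numbers, and `predict_all` is a hypothetical helper name, not something taken from the text.

```python
def predict_all(S, w):
    """Apply the linear classifier to every row of an N x K matrix S.

    Returns a list y_hat of length N with y_hat[n] = sum_k S[n][k] * w[k].
    """
    K = len(w)
    if any(len(row) != K for row in S):
        raise ValueError("every row of S must have K entries")
    return [sum(s_nk * w_k for s_nk, w_k in zip(row, w)) for row in S]


# Made-up example: N = 2 peptides, K = 3 bio-bases.
S = [
    [1.0, 0.75, 0.0],  # peptide 1 mapped into the bio-basis function space
    [0.0, 0.25, 1.0],  # peptide 2 mapped into the bio-basis function space
]
w = [1.0, 0.5, -1.0]   # one weight per bio-basis (column of S)

y_hat = predict_all(S, w)
```

The output has one entry per peptide, so it can be compared element by element with the N entries of the target vector.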